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Abstract 

Semi-mangroves form a group of transitional species between glycophytes and halophytes, and hold 
unique potential for learning molecular mechanisms underlying plant salt tolerance. Millettia pinnata 
is a semi-mangrove plant that can survive a wide range of saline conditions in the absence of specialized 
morphological and physiological traits. By employing the lllumina sequencing platform, we generated 
~192 million short reads from four cDNA libraries of M. pinnata and processed them into 108 598 uni- 
sequences with a high depth of coverage. The mean length and total length of these unisequences were 
606 bp and 65.8 Mb, respectively. A total of 54 596 (50.3%) unisequences were assigned Nr annotations. 
Functional classification revealed the involvement of unisequences in various biological processes related 
to metabolism and environmental adaptation. We identified 23 81 5 candidate salt-responsive genes with 
significantly differential expression under seawater and freshwater treatments. Based on the reverse tran- 
scription -polymerase chain reaction (RT-PCR) and real-time PCR analyses, we verified the changes in 
expression levels for a number of candidate genes. The functional enrichment analyses for the candidate 
genes showed tissue-specific patterns of transcriptome remodelling upon salt stress in the roots and the 
leaves. The transcriptome of M. pinnata will provide valuable gene resources for future application in crop 
improvement. In addition, this study sets a good example for large-scale identification of salt-responsive 
genes in non-model organisms using the sequencing-based approach. 
Key words: transcriptome characterization; lllumina sequencing; salt-responsive genes 



1. Introduction 

High salinity is one of the most severe environmen- 
tal stresses causing crop yield reduction. The ability of 
plants to develop salt tolerance is a trait governed by 
multigenes and is a determining factor in their 
growth and productivity. 1 Among the intertwining 
physiological and molecular mechanisms, the 
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regulation of gene expression plays a pivotal role in 
plant salt tolerance, 2,3 and it has therefore stimulated 
increasing efforts on global expression profiling for 
various plant species over the past decade. 

The early insights into transcriptome responses 
were primarily obtained from the glycophytic model 
plants and major crops such as Arabidopsis, 4,5 
rice, 6,7 and maize, 8,9 in which hundreds of potential 
salt-responsive genes were identified and analysed 
to determine their expression patterns under salt 
treatments. The studies on glycophytes also revealed 
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the involvement of several signal transduction path- 
ways in salt stress response, including the salt overly 
sensitive (SOS) pathway, the ABA signalling pathway, 
the ethylene signalling pathway, and the Ca 2+ -de- 
pendent signalling pathway. 1 0-1 3 These pathways 
constitute a dynamic network and work collectively 
to combat ionic disequilibrium, osmotic imbalance, 
and other deleterious effects of salt stress. However, 
most glycophytes can only endure relatively low con- 
centrations of salt. For example, Arabidopsis cannot 
survive NaCl concentrations higher than 200 mM. 14 
Therefore, the transcriptome analyses of these glyco- 
phytes would miss the information about some key 
genes or pathways that are related to salt tolerance. 

To broaden the knowledge on the regulatory 
network for plant salt tolerance, the focus of recent 
transcriptomic research has been gradually shifting 
from the salt-sensitive glycophytes to the salt-tolerant 
halophytes. Salt cress (Thellungiella halophila) is the 
most commonly surveyed halophyte because its 
high nucleotide sequence identity with Arabidopsis 
cDNA facilitates the use of molecular and genetic 
tools that are available for Arabidopsis. Comparisons 
of the expression profiles between these two species 
demonstrated that they possessed both shared and 
divergent responses to salt stress, and the constitutive 
overexpression of many genes that were stress- 
inducible in Arabidopsis might contribute to the 
stress tolerance of salt cress. 15,16 These findings are 
partly in accordance with the hypothesis that salt 
tolerance mechanisms are largely conserved in halo- 
phytes and glycophytes, and that the large variations 
in their tolerance arise from subtle differences in the 
regulation of the same basic set of genes. 17 In other 
words, the formation of salt tolerance is a cumulative 
process with stepwise changes in gene regulation. 
Therefore, the salt-induced response of the transition- 
al species between glycophytes and halophytes may 
provide new insights into the acquisition of salt toler- 
ance. Nevertheless, the transcriptomic profiles of such 
transitional species are still not documented. 

Millettia pinnata is a semi-mangrove (also called 
'mangrove associate') that can grow in either fresh- 
water or seawater habitats. A recent study regarding 
the differentiation between true mangroves and 
semi-mangroves based on their leaf traits and salt 
contents proposed M. pinnata to be a glycophyte 
with moderate salt tolerance. 18 Unlike other man- 
grove species, M. pinnata does not exclude salt via 
ultra-filtration system at its root surface, excrete salt 
through glands on its leaf surface, or develop succu- 
lence leaf. However, this plant can still endure salinity 
up to 3%. 19,20 In the absence of specialized morpho- 
logical and physiological traits, the adaptation of M. 
pinnata may be more exclusively attributed to the al- 
ternation in gene regulation, which makes M. pinnata 



extremely attractive for investigating on its transcrip- 
tomic adjustments under salt stress. 

As an outbreeding diploid (2n = 2x = 22) legume, 
M. pinnata has high levels of seed oil and has received 
much attention on its applications in biodiesel. 21 ~ 23 
In addition, its biomedical properties have also been 
investigated because it was traditionally known as a 
medicinal plant. 24,25 However, the genetic back- 
ground of this non-model species remains largely 
unknown. To date, only a few genes, such as the 
phytochrome A (PHYA) gene, the maturase (matR) 
gene, and the ribosomal RNA gene, have been 
sequenced from M. pinnata, and none of these 
genes have been shown to be directly responsible 
for its salt tolerance. 26-28 This situation might now 
be changed with the introduction of next-generation 
massively parallel sequencing technologies developed 
by Roche/454 Life Sciences and lllumina. 29,30 These 
sequencing-based approaches do not rely on the 
prior knowledge of genomic sequences, hence they 
are particularly useful in quantifying transcriptomes 
for non-model organisms. 31 

In this study, we initiated a comprehensive tran- 
scriptome analysis of M. pinnata based on the 
lllumina sequencing platform. Our goals were to (i) 
set up a data set with abundant transcript sequences 
from the root and leaf tissues of M. pinnata, (ii) screen 
out genes with differential expression under seawater 
and freshwater treatments as candidate salt-respon- 
sive genes, and (iii) generalize the common and 
tissue-specific patterns of transcriptomic responses 
to salt stress in M. pinnata. The characterization of 
the transcriptome changes in this transitional 
species may provide useful complements to the 
understandings of molecular mechanisms underlying 
plant salt tolerance. The candidate genes identified in 
this study can be applied to molecular breeding for 
plants with enhanced salt tolerance. 

2. Materials and methods 

2.1 . Plant growth and salt treatments 

The seeds of M. pinnata were obtained from the 
Dongzhaigang National Nature Reserve in Hainan, 
China. The field-collected seeds were surface- 
sterilized with a 75% ethanol solution, rinsed with 
sterilized distilled water, and sown in sterilized cover 
soil for germination. The seedlings were grown in 
plastic pots under a 12/12 h photoperiod at 25°C 
(day) and 1 8°C (night). The volume of each plant 
pot is ~800 ml. For the salt treatment, a group of 
3-week-old plants were watered by seawater with a 
salinity of 3% (w/v, ~500 mM NaCl). The treatments 
were initiated at 8:00 a.m., at which time each indi- 
vidual plant was watered with 300 ml of seawater to 
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field capacity. Meanwhile, a group of control plants 
were watered with an equal amount of freshwater 
(distilled water). 

2.2. Sample collection and RNA preparation 

The roots and leaves were sampled at 2, 4, and 8 h 
after the application of seawater or freshwater. At 
each time point, the primary root with some lateral 
roots and three leaves at the shoot apex were simul- 
taneously collected from each individual plant, and 
were separately frozen in liquid nitrogen and stored 
at -80°C prior to RNA extraction. The total RNA 
was isolated using TRIzol reagent (Invitrogen) follow- 
ing the manufacturer's instructions. The quality of 
RNAs was determined with a 2100 Bioanalyzer 
RNA Nanochip (Agilent Technologies). For each 
sample, at least 20 |xg of total RNAs was sent to the 
Beijing Genomics Institute for lllumina sequencing 
(commercial service). For the reverse transcription- 
polymerase chain reaction (RT-PCR) and real-time 
PCR experiments, the total RNAs were prepared 
following the aforementioned procedures. 

2.3. Sequencing and assembly 

For a given tissue, equal amounts of total RNAs from 
the plants treated for 2, 4, and 8 h were pooled in a 
combined sample. Totally, four combined samples 
were collected and denoted as MpRs, MpRf, MpLs, 
and MpLf according to the tissues (Root or Leaf) and 
the treatments (seawater or/reshwater) of their sam- 
pling sources. For each combined sample, mRNAs 
were purified using oligo(dT)-attached beads and 
fragmented into small pieces (100-400 bp). The 
cleaved RNA fragments were then primed with 
random hexamers and submitted to the synthesis of 
the first-strand and second-strand cDNAs. The synthe- 
sized cDNAs were ligated with paired-end adaptors. 
Then, the cDNA fragments with 200 bp (±20 bp) 
size were selected by agarose gel electrophoresis and 
enriched by PCR amplification. Finally, four cDNA li- 
braries were constructed for sequencing on an 
lllumina GA llx. All sequence data have been deposited 
in the Short Read Archive (SRA) at the NCBI database 
under the project accession number SRA046342.1 . 

After sequencing, the raw sequence data were 
first purified by trimming adapter sequences and 
removing low-quality sequences. The resulting 
clean reads were assembled using the SOAPdenovo 
program (version 1.04; http://soap.genomics.org.cn/ 
soapdenovo.html) with K-mer size set to 29 bp. 32 
The clean reads were split into smaller pieces, the 
K-mers, and were conjoined into contigs using the 
de Bruijn graph. Next, the contigs from the same tran- 
script were identified by paired-end reads and were 
connected into scaffolds. Then the gap fillings were 



carried out using the pair-end information to retrieve 
read pairs with one read well aligned on the contigs 
and another read located in the gap region. The 
resulting scaffolds with least Ns were defined as uni- 
genes. The unigenes assembled by short reads from 
four samples were further clustered into all-unigenes 
in a comprehensive transcriptome using TGI 
Clustering Tool (TGICL) with the parameters of 
50 bp overlap and a minimum of 90% identity. 33 

2.4. Data analysis 

The sequence orientations of the all-unigenes 
were determined by BLASTx against the NCBI non- 
redundant (Nr) protein database, the Swiss-Prot 
protein database, the Kyoto Encyclopedia of Genes 
and Genomes (KEGG) pathway database, and the 
Cluster of Orthologous Groups (COG) database. The 
incongruent results from different databases were 
settled under a priority order of Nr, Swiss-Prot, 
KEGG, and COG. For the rest all-unigenes that were 
unaligned to the above databases, ESTScan was used 
to predict their sequence orientations. 34 

For assignments of gene descriptions, the all- 
unigenes were searched against the Nr database 
using BLASTx with an £-value cut-off of 1.0e-5. The 
BLAST results were parsed by a Perl script (available 
on request). The results of only the best hit were 
extracted. In practice, the £-values for more than 
85% of the annotated sequences were <1.0e-10. 
Based on their Nr annotations, the all-unigenes were 
assigned GO annotations using Blast2GO (http:// 
www.blast2go.rog/), 35 followed by functional classifi- 
cation using the WEGO software. 36 Besides, the puta- 
tive metabolic pathways for the all-unigenes were 
assigned by performing BLASTx against the KEGG 
pathway database with the £-value cut-off of 1 .Oe-5. 

The RPKM (reads per kilobase per million reads) 
values were applied to measure the gene expression 
levels. 37 For a given all-unigene, four RPKM values 
were generated by mapping the reads from the four 
subtranscriptomes to it using SOAP2 with a 
maximum of three mismatches. 38 Only the reads 
mapped to a single all-unigene were used to estimate 
the expression level. The differentially expressed genes 
(DEGs) between the seawater- and freshwater-treated 
samples were identified using an R package named 
'DEGseq'. 39 The GO enrichment analysis and KEGG 
pathway enrichment analysis for the DEGs were 
both performed by conducting hypergeometric tests 
with the whole M. pinnata transcriptome set as the 
background. 

2.5. RT-PCR and real-time PCR analyses 

We conducted RT-PCR for 100 DEGs using the 
M. pinnata actin and 1 8S rRNA genes as the controls. 
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The first-strand cDNAs were synthesized from 1 |xg of 
total RNAs using the PrimeScript® 1st Strand cDNA 
Synthesis Kit (TAKARA). The gene-specific primers 
were designed based on the sequencing results using 
the Primer Premier software (version 6.0). Then, the 
20 |xl of PCR samples containing 1 jul I of first-strand 
cDNAs and 1 |xl of primers were subjected to 35 
cycles of 30 s denaturing at 94°C, 30 s annealing at 
the temperature that was altered for different DEGs, 
and 30 s extending at 72°C. The PCR products were 
electrophoresed on a 1 % agarose gel. 

The real-time PCR was performed on an ABI PRISM 
7300 Sequence Detection System (Applied 
Biosystems) using 1 julI of first-strand cDNAs and 
SYBR® Premix Ex Taq™ (TAKARA). The thermal 
cycling conditions were 30 s at 95°C, followed by 40 
cycles of 5 s at 9 5°C and 3 1 s at 60°C. All of the reac- 
tions were performed in triplicate. The relative expres- 
sion levels for each gene were calculated using the 
2"Aact me th oc | w it h normalization to the internal 
control gene. The LSD-t test of one-way analysis of 
variance was performed to determine the significant 
differences in expression levels between the treated 
samples at 1, 2, 4, and 8 h and those at 0 h using 
the SPSS software package (version 1 3.0). 



3. Results 

3. 7. Illumina sequencing and de novo assembly 

Our preliminary experiments showed that the 
young leaves of the M. pinnata plants treated with sea- 
water began to wilt at 2 h, wilted to the greatest 
extent at 4 h, and almost completely recovered by 
8 h after the treatments. Meanwhile, the control 
plants showed no symptoms during the course of 
the freshwater treatments. This result indicates that 
M. pinnata may suffer severe salt stress at 4 h, but it 
may overcome this stress by 8 h. We performed 
high-throughput sequencing of four samples, i.e. 
MpRs and MpRf from the roots, MpLs and MpLf from 
the leaves. For a broad survey of the salt-responsive 
genes, a combined cDNA library was constructed for 
each sample, which was represented by a transcript 
mixture with equal amounts of total RNA from 2, 4, 
and 8 h-treated plants. The Illumina sequencing was 
then performed separately for four cDNA libraries 
and generated four sub-transcriptomes with 7 5-bp 
raw reads. After filtration of low-quality and adapter 
sequences, nearly 192 million clean reads were 
remained for the four sub-transcriptomes. The per- 
centage of Q2 0 bases for the clean reads in the four 
sub-transcriptomes were all ~93% (Supplementary 
Table S1). In total, these clean reads constitute 
~ 1 2 GB of sequence data. 
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The clean reads from the MpRs, MpRf, MpLs, and 
MpLf sub-transcriptomes were assembled into 
248 01 1, 238 722, 1 94 806, and 1 94 020 contigs, 
respectively (Supplementary Table S2). The average 
contig sizes of the four sub-transcriptomes were all 
above 200 bp. The following step used the paired- 
end reads to identify the contigs from the same tran- 
script, and these contigs were merged into 1 30 027, 
1 27 341, 1 01 73 7, and 104 087 scaffolds with an 
average size over 3 50 bp (Supplementary Table S2). 
The scaffolds were further processed by a gap-filling 
step, which resulted in 78 062, 78 165, 63 081, 
and 64 436 unigenes for MpRs, MpRf, MpLs, and 
MpLf, respectively (Fig. 1). The numbers of 
unigenes obtained from the root samples were 
larger than those from the leaf samples, whereas the 
mean lengths of the unigenes were longer in the 
latter samples than those in the former ones 
(Table 1 ). 

Under the clustering criteria with a minimum of 
50-bp overlap and 90% identity, the unigenes from 
all four sub-transcriptomes were further clustered 
into 108 598 all-unigenes to form a comprehensive 
transcriptome of M. pinnata (Fig. 1). The gene 
number showed a substantial increase from the 
sub-transcriptome to the comprehensive transcrip- 
tome, and the increase was more prominent for 
the genes with larger sizes (Table 1). The length of 
the all-unigenes ranged from 200 to 10 843 bp, 
with the mean length and the N50 value rises to 
606 and 887 bp, respectively. As a result of the 
increase in both gene number and gene length, the 
total length of the all-unigenes extended to 
65.8 Mb after clustering (Table 1 ). A total of 49 836 
(45.9%) all-unigenes were clustered by at least 
two unigenes, with a maximum of 31 unigenes per 
all-unigenes, whereas the rest 58 762 (54.1%) were 
corresponding to single unigene from any of the four 
sub-transcriptomes. The average sequencing depth 
calculated by realigning all usable sequencing reads 
to the all-unigenes were ~30-folds for each sample. 
The quality of these all-unigenes was evaluated by 
two analyses: (i) the random distribution of reads in 
the all-unigenes showed that the all-unigenes were 
evenly covered by the reads from each sub-transcrip- 
tome with relatively fewer reads in the 3' ends of 
them (Supplementary Fig. S1), and (ii) the gap distri- 
bution of the all-unigenes indicated that the majority 
of them displayed a ratio of gap length to gene length 
<5% (Supplementary Fig. S2). 

3.2. Gene annotation and functional classification 

First, the all-unigenes of M. pinnata were assigned 
putative gene descriptions based on the BLAST 
search against the NCBI non-redundant (Nr) protein 
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Figure 1. Flowcharts of the transcriptome analyses for M. pinnata. The steps include sequence assembly and clustering, Nr annotation, 
GO annotation, KEGG analysis, and identification of DEGs as candidate salt-responsive genes. 





Table 1. 


Statistics for the uni. 


;enes of M. pinnata 






Length of unigenes (bp) 


Number of unigenes 








MpRs 


MpRf 


MpLs 


MpLf 


All 


200-500 


54 61 7 


53 061 


41 787 


41 700 


70455 


500-1 000 


1 6 231 


16 819 


1 3 930 


14 539 


21 028 


1 000-1 500 


4403 


4930 


4348 


4604 


81 84 


1 500-2000 


1 677 


1914 


1 800 


2036 


4356 


>2000 


1 1 34 


1441 


1216 


1 557 


4575 


Total 


78 062 


78 165 


63 081 


64436 


1 08 598 


Mean length (bp) 


497 


521 


537 


558 


606 


N50 (bp) 


589 


634 


674 


708 


887 


Total length (bp) 


38 794 023 


40 744 474 


33 850 579 


35 949 682 


65 81 4 388 



MpRs, MpRf, MpLs, and MpLf represent four sub-transcriptomes that were derived from two tissues (Root or Leaf ) under two 
treatments (seawater or /res h water). The last column presents the statistics for the all-unigenes clustered by the unigenes 
from four sub-transcriptomes. 



database. Out of the 1 08 598 all-unigenes, 54 596 
(50.3%) showing significant similarity with proteins 
in the Nr database were assigned Nr annotations 
(Fig. 1). Overall, these all-unigenes matched 24 281 
unique protein accessions, which indicated that our 
sequencing project had generated a substantial 
number of genes for M. pinnata. Only 33.2% of the 
all-unigenes with a length <500 bp had BLAST hits 
in the Nr database. The proportion of all-unigenes 
with BLAST hits increased markedly for those with 



larger sizes (Fig. 2). It seemed that longer all-unigenes 
were more likely to have Nr annotations. The £-value 
distribution of the top hits in the Nr database also 
revealed that a larger proportion of all-unigenes 
longer than 500 bp had strong homology (£-value 
<1.0e-50) (Supplementary Fig. S3). 

The majority of these annotated all-unigenes 
(54 266 out of 54 596) displayed the highest hom- 
ology to genes from plants (Table 2). A quarter of 
them retrieved an annotation from Arabidopsis, 
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Length distribution of all-unigenes 



Figure 2. Length distribution of the all-unigenes with Nr 
annotations. In this study, only 33% of the all-unigenes shorter 
than 500 bp had BLAST hits in the Nr database, whereas more 
than 80% of the all-unigenes over 500 bp did. 



Table 2. Annotation sources for the all-unigenes of A/1, pinnata 



Annotation sources Number of matched all-unigenes 

Ten species with the most abundant matches 



Ambidopsis thaliana 


1 5 308 


Oryza sativa 


5739 


Glycine max 


51 98 


Ambidopsis lyrata 


4551 


Medicago truncatula 


441 4 


Populus tricbocarpa 


3463 


Vitis vinifera 


2491 


Zea mays 


1 067 


Lotus japonicus 


705 


Ricinus communis 


685 


Ten legume genus with the most abundant matches 


Glycine 


5220 


Medicago 


4722 


Lotus 


722 


Pisum 


683 


Phaseolus 


550 


Vigna 


312 


Cicer 


197 


Trifolium 


1 57 


Arachis 


1 52 


Vicia 


1 08 


nr_plants 


54 266 


nr_nonplants 


330 



'nr_plants' and 'nr_nonplants' indicate the annotation 
sources from the NCBI non-redundant sequences for 
plants and non-plant species, respectively. 



partly due to the fact that the Arabidopsis genome an- 
notation is the most advanced and complete for any 
higher plant to date in the Nr database. Besides, 
there were also a quarter of annotated all-unigenes 
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with matched accessions from its legume relatives 
(Table 2). The remaining <1% of annotated all- 
unigenes were annotated with sequences from 
the non-plant sources. Interestingly, we found 12 
all-unigenes with an Nr annotation from 
Phytophthora sojae, which is a plant pathogen that 
causes soybean stem and root rot. 40 Accordingly, we 
verified that all of these 12 all-unigenes were clus- 
tered by unigenes from the root-derived sub-tran- 
scriptome. This observation demonstrated a good 
representation of the comprehensive transcriptome 
in which even trivial amounts of pathogen transcripts 
were detected. 

Following their Nr annotations, we mapped the all- 
unigenes into the records of the GO database and 
retrieved GO annotations for 1 1 974 of them 
(Fig. 1). These all-unigenes were assigned GO terms 
from the three main categories, including 6407 all- 
unigenes with terms from 'Biological process', 7095 
all-unigenes with terms from 'Cellular component', 
and 8299 all-unigenes with terms from 'Molecular 
function' (Fig. 3). Among them, 3228 all-unigenes 
had an assignment in all three categories. The 
remaining all-unigenes failed to obtain a GO term, 
largely due to their uninformative (e.g. 'unknown', 
'putative', or 'hypothetical' protein) descriptions. 
Within the 'Biological process' category, the two 
most abundantly represented lineages were 'metabol- 
ic process' and 'cellular process'. There were also a 
large number of genes involved in 'biological regula- 
tion', 'pigmentation', 'localization', and 'response to 
stimulus'. In the 'Molecular function' category, 
'binding' and 'catalytic activity' were predominantly 
represented. The former was mainly represented by 
genes for 'nucleotide binding' and 'protein binding', 
while the latter was mainly represented by genes 
with 'transferase activity', 'hydrolase activity', and 
'kinase activity'. In the 'Cellular component' category, 
most of all-unigenes were located in 'cell part'. In con- 
trast, rare all-unigenes were sorted into 'extracellular 
region part' or 'virion part'. 

Finally, we performed the KEGG pathway analysis to 
assign biological pathways to the all-unigenes. In total, 
23 697 out of 108 598 (21.8%) all-unigenes were 
assigned 1 24 KEGG pathways (Fig. 1 ). These pathways 
belonged to 20 clades under five major KEGG cat- 
egories, including 'Metabolism', 'Genetic information 
processing', 'Environmental information processing', 
'Cellular processes', and 'Organismal systems' 
(Fig. 4). Among them, 'plant-pathogen interaction', 
'spliceosome', 'biosynthesis of phenylpropanoids', 
'biosynthesis of plant hormones', and 'ribosome' 
were the top five pathways most represented by the 
all-unigenes. These results suggested that the living 
of M. pinnata was featured by dynamic metabolism 
as well as active environmental adaptation. 
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Figure 3. GO classification of M. pinnata transcriptome. The results are summarized in three main categories as follows: biological process, 
cellular component, and molecular function. In total, 1 1 974 all-unigenes have been assigned GO terms. In some cases, one all-unigene 
has multiple terms. 
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Figure 4. Number of all-unigenes in each clade of the KEGG pathway maps. The all-unigenes were assigned 1 24 KEGG pathways within 20 
clades under five major categories: Metabolism (I), Genetic information processing (II), Environmental information processing (III), 
Cellular processes (IV), and Organismal systems (V). In each category, the clades are listed according to their abundancies of 
all-unigenes. 



3.3. Identification and functional classification 
of DEGs 

We singled out the DEGs with two criteria for their 
expression profiles: (i) the average fold change in 
gene expression level was more than or equal to 2- 
fold between the seawater- and freshwater-treated 
samples and (ii) the false discovery rate of the 
single-sample t-test was <0.1%. Under these criteria, 
23 81 5 out of 108 598 (21.9%) all-unigenes 



were identified to be differentially expressed in 
at least one tissue following the salt treatment 
(Supplementary Table S3; Fig. 1). In this study, DEGs 
with higher expression levels in seawater-treated 
samples when compared with freshwater-treated 
samples were denoted as 'up-regulated', while those 
with lower expression levels in seawater-treated 
samples were denoted as 'down-regulated'. Overall, 
there were much more DEGs in the roots than those 
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Figure 5. Number of DEGs in the root and the leaf of M. pinnata. 
The numbers of DEGs that were exclusively up- or down- 
regulated in one tissue are shown in each circle. The numbers 
of DEGs with a common or opposite tendency of expression 
changes between the two tissues are shown in the overlapping 
regions. The total numbers of up- or down-regulated genes in 
each tissue are shown outside the circles. 

in the leaves (Fig. 5). In the root, the number of up- 
regulated DEGs was less than that of down-regulated 
DEGs. The situation was reversed in the leaf, in 
which more up-regulated DEGs than down-regulated 
DEGs were identified. Only a small portion of DEGs 
(1050 up-regulated and 622 down-regulated) 
shared common tendency of expression changes 
between the roots and the leaves. A still less of DEGs 
showed a completely opposite tendency of expression 
changes between the two tissues (577 DEGs up- 
regulated in the leaf but down-regulated in the root, 
and 157 DEGs down-regulated in the leaf but 
up-regulated in the root). The remaining majority 
of DEGs were exclusively up-regulated or down- 
regulated in either tissue. 

To further characterize the expression changes in 
these two tissues, we conducted GO enrichment ana- 
lysis for the DEGs with the whole transcriptome set as 
the background. In the root, the significantly overre- 
presented GO terms of 'Biological process' were 
'gene expression', 'sulphur metabolic process', 're- 
sponse to oxidative stress', 'cellular amino acid deriva- 
tive metabolic process', 'oxidation reduction', and 
'secondary metabolic process' (Supplementary Table 
S4). In the leaf, the significantly overrepresented GO 
terms of 'Biological process' included 'oxidation reduc- 
tion', 'cellular amino acid derivative metabolic pro- 
cesses', and 'cellular aromatic compound metabolic 
process'. From these results, we noticed that both 
tissues of M. pinnata tend to mobilize genes related 
to oxidation reduction and cellular amino acid metab- 
olism. Furthermore, as the salt-induced deleterious 
effects varied among different parts of a plant, the 
roots and the leaves also activated some distinct 
groups of genes (i.e. genes in 'translation', 'sulphur 
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metabolic process', and 'response to oxidative stress' 
for the root, genes in 'phenylpropanoid metabolic 
process' for the leaf) to overcome them. 

The KEGG pathway enrichment analysis for the 
DEGs also revealed both common and tissue-specific 
patterns of overrepresentations. The DEGs were 
enriched in 3 7 pathways in the root and 31 pathways 
in the leaf (Supplementary Table S5). Among them, 
1 9 pathways appeared in both tissues and were 
mostly related to amino acid metabolism, lipid me- 
tabolism, energy metabolism, biosynthesis of second- 
ary metabolites, and environmental adaptation. 
Generally speaking, this finding demonstrated that 
both tissues of M. pinnata demand more energy and 
more biological substances to cope with salt stress. 
In addition to these common pathways, there were 
some tissue-specific pathways with overrepresented 
DEGs. For example, in the clade of 'Energy metabol- 
ism', the leaves of M. pinnata were enriched with 
DEGs in 'oxidative phosphorylation' and 'photosyn- 
thesis' pathways, and these processes were also 
employed by the leaves of other plant species under 
salt stress. 41-43 On the other hand, the roots of M. 
pinnata were enriched with DEGs in the 'sulphur me- 
tabolism' pathway, which was likely related to their 
sulphur-rich living conditions. 44 



3.4. Experimental verification of DEGs 

To assess the reliability of our sequencing-based ap- 
proach in identifying salt-responsive genes, we moni- 
tored the expression of candidate DEGs. In the first 
place, we performed RT-PCR for 100 randomly 
chosen candidates with specific primers 
(Supplementary Table S6). We compared gene expres- 
sion levels at the three time points separately 
between stressed and control samples. The expression 
profiles of 70 candidates were basically in agreement 
with the predictions from the lllumina sequencing 
results. Six candidates displayed incongruent expres- 
sion profiles with the predictions. The remaining 24 
candidates showed non-specifically amplified pro- 
ducts or no products. In other words, 70-92% of 
the candidates were confirmed by RT-PCR analyses. 
The confirmed candidates were differentially 
expressed in at least one of the three time points 
(Supplementary Fig. S4). In the root, the maximal 
number of confirmed DEGs occurred at the 4 h time 
point (Supplementary Table S7). Conversely, the 
minimal number of confirmed DEGs in the leaf oc- 
curred at the 4 h time point, whereas the maximum 
occurred at the 8 h time point. From these results, 
we can infer a delay of transcriptional response to 
salt stress in the leaf, which may be explained by a 
later salt accumulation in this tissue. 9 
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Figure 6. Relative expression levels of 20 DEGs at a series of time points following the seawater treatments. The relative gene expression 



levels as expressed by 2 



were determined separately for each tissue and were presented as the mean + SD. Significant 



differences (P< 0.01 , LSD-ttest) between the expression levels at 1 , 2, 4, and 8 h time points and those at 0 h are indicated by asterisks. 



In the second place, we employed real-time PCR to 
delineate the detailed expression profiles of 20 candi- 
date DEGs. These candidates mainly included genes 
that were reported to be related to salt stress response 
in other plant species, such as genes involved in 
several signal transduction pathways, genes associated 
with osmotic adjustments or detoxification of reactive 
oxygen species (ROS), and genes encoding transcrip- 
tion factors (Supplementary Table S8). Besides, there 
were three candidates with unknown functions, 
which represented a new resource for gene transfer 
from M. pinnata to its legume relatives or other crops. 

All of these 2 0 candidates showed significantly dif- 
ferential expression with the original expression level 
in at least one time point following the seawater 



treatments (Fig. 6). In other words, all candidates dis- 
played an expression pattern roughly consistent with 
the prediction based on the lllumina sequencing 
results. In most cases, the expression changes were 
more dramatic in the roots than in the leaves. 
Besides, most of the candidates were up-regulated 
from the 0 to 4 h time point, and then they gradually 
restored to the original expression levels. The pace of 
the rise and fall in gene expression levels was 
roughly coincident with the wilting and recovering 
of the young leaves. This phenomenon was also 
observed in other mangroves, 42 and could be 
explained by referring to the different phases of salt 
stress response. 43 The up-regulation and restoration 
of the DEGs in both tissues may parallel the phases 



204 



Transcriptome Sequencing and Analysis of M. pinnata 



[Vol. 1 9, 



of 'salt accumulation' and 'osmotic recovery', and the 
delay in gene activation in the leaves may result from 
an extra phase of 'dehydration' preceding the above 
two phases. The leaves wilted during the 'dehydration' 
phase, and then revived gradually during the follow- 
ing two phases. In this sense, these results again 
demonstrated that a later salt accumulation was re- 
sponsible for the delay in the transcriptional response 
in the leaves. 



4. Discussion 

Mangroves are a group of plant species that live 
along the sea coast with the ability to tolerate high 
salinity. 44 The extremely high salinity within their 
living conditions confers mangroves unique resources 
for exploiting the dynamic nature of their gene regu- 
lation during salt exposure. While the collections of 
ESTs based on cDNA libraries from several mangrove 
species have been described, 45-47 the transcript 
data for such a plant group containing diverse 
species are still scarce. In a recent study using the 
454 pyrosequencing platform, 67 375 and 96 989 
unisequences with a mean length of 360 and 
433 bp were produced for two taxonomically 
diverse mangroves. 48 Compared with these results, 
our transcriptome for M. pinnata exceed in both the 
total number and the mean length of the unise- 
quences. The mean length of the all-unigenes in our 
transcriptome was also longer than those reported 
for other plant transcriptomics using either the 
lllumina sequencing technology 49,50 or the 454 pyro- 
sequencing technology. 51,52 

The enhancement in the gene length of our tran- 
scriptome may be attributed to various factors, such 
as the optimized assembly parameters (e.g. the K- 
mer size). One of the most important reasons for 
this enhancement might be that the final compre- 
hensive transcriptome was indeed integrated by four 
sub-transcriptomes, each of which was comparable 
in gene number and gene length with those in the 
previous studies. The final unisequences possessed a 
relatively high depth of average coverage (~ 3 0-fold) 
by reads from each sub-transcriptome. We also 
noticed an increase in the number of unisequences 
from the sub-transcriptome to the comprehensive 
transcriptome. The 'extra' unisequences represented 
either splice variants or genes with tissue-specific or 
stress-specific expression. So far, our transcriptome 
for M. pinnata has presented many more sequences 
than the sequencing efforts for any other mangrove 
species have produced. 53 

As for the GO and KEGG annotations, the aforemen- 
tioned two mangrove species displayed remarkable 
similarities to each other. 48 They both showed a lack 



of representation in the GO category 'response to 
stimulus', whereas the transcriptome of ISA. pinnata 
was well represented in this category. We suggest 
that this discrepancy may be partly attributed to the 
different status of adaptation for each mangrove 
species. It has been hypothesized that semi- 
mangroves represent an intermediate during the 
long-term evolution of mangroves in adaptation to 
the high-salinity environments. 54 Accordingly, while 
those two mangroves that were 'completely' adapted 
to the extreme conditions did not require so many 
transcripts for responding to stimulus, M. pinnata 
with a 'half tolerance may still be in great need 
of these transcripts. With the exception of this 
discrepancy, the functional categories were similar 
between M. pinnata and those two mangroves. 
For example, all these three species possessed 
large quantities of transcripts related to energy 
metabolism. 

Based on the comparison of gene expression levels 
between stressed and control samples using RPKM 
values, we screened out 23 81 5 all-unigenes as candi- 
date salt-responsive genes, which took up ~22% of 
the total all-unigenes. This percentage was higher 
than the percentages of salt-induced DEGs derived 
from other related species, such as Bruguiera gymnor- 
rhiza (another mangrove) and Medicago truncatula 
(another legume), using the array-based plat- 
form. 41,55 First, the differences in DEG percentages 
might reflect the inherent differences between 
species in their transcriptome profiles. Secondly, it 
might reflect a greater sensitivity and higher reso- 
lution of the sequencing-based platform in identifying 
DEGs. The sequencing-based approaches offer several 
advantages over the existing hybridization-based 
approaches, which often suffer from background 
and cross-hybridization problems. 56 Thirdly, since 
each sample for lllumina sequencing was pooled by 
equal amounts of RNAs from 2, 4, and 8 h-treated 
plants, the RPKM values in each sub-transcriptome ac- 
tually reflected the average gene expression levels for 
the three time points. Theoretically speaking, our 
methods would miss some genes with 2-fold differ- 
ence in expression levels between stressed and 
control samples at less than two time points due to 
the diluting effect of the pooled samples. 
Nevertheless, it could help us to detect genes with 
more dramatic changes (4-fold or greater) in expres- 
sion levels at even only one time point, which might 
be more closely related to salt stress response. Lastly, 
we still should not neglect the possibilities of falsely 
discovered DEGs through the sequence-based plat- 
form. In our study, both RT-PCR and real-time PCR 
analyses demonstrated a high reliability of the candi- 
date salt-responsive genes identified by sequencing- 
based approach. 
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The DEGs identified in M. pinnata were enriched in 
the pathways related to protein synthesis and biosyn- 
thesis of secondary metabolites. A comparative study 
between glycophyte and halophyte showed that 
Arabidopsis adopted a global defence strategy that 
requires bulk protein synthesis, whereas salt cress 
mainly activated genes that functioned in protein 
modification and redistribution. 16 It seemed that 
protein modification might be a faster mechanism 
for stress response than protein synthesis. In this 
aspect, M. pinnata was more similar to glycophytes. 
On the other hand, salt cress showed a salt-induced 
increase in secondary metabolites, 1 6 which were 
also found to accumulate in M. pinnata but not in 
Arabidopsis. Taken together, these two results 
suggest that M. pinnata can utilize the adjustment 
mechanisms preferentially adopted by either glyco- 
phytes or halophytes. 

In conclusion, we characterized a comprehensive 
transcriptome of a semi-mangrove plant M. pinnata 
using the lllumina sequencing technology. This tran- 
scriptome will not only provide plenty of informative 
data, but also expand the vision of the genetic basis 
underlying adaptation for the mangrove community. 
The candidate salt-responsive genes identified in M. 
pinnata contain both the previously reported salt-re- 
sponsive genes and some species-specific genes 
which will be a new resource for molecular breeding 
in legumes or other crops. The functional enrichment 
analyses for the candidate genes showed both 
common and tissue-specific patterns of transcriptome 
remodelling under salt treatments. Moreover, the ex- 
perimental analyses for some candidate genes pro- 
vided some evidences for a delay in transcriptional 
responses in the leaves. Our work has demonstrated 
a high reliability of the sequencing-based approach 
in identifying stress-responsive genes in non-model 
organisms and established a connection between gly- 
cophytes and halophytes in the views of transcrip- 
tomic adjustments to salt stress. 

Supplementary Data: Supplementary Data are 
available at www.dnaresearch.oxfordjournals.org. 
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